1. INTRODUCTION

Sentiment analysis is one of the most popular data analytics tools, since it is frequently used in
different public and private sectors in the form of product or service surveys, or through the analysis of
social media data such as Twitter data. People and communities usually post their opinions on websites,
forums, emails, and public blogs, and describe their views about products, business processes, or
decisions. Those opinions may be termed positive, negative, or neutral with respect to the domain concerned.
Sentiment analysis helps the targeted agencies to analyze people's reactions towards product usage and
quality, and towards future creations in the industries. Nowadays, the world wide web plays a vital role in
sentiment analytics because people generally like to share their opinions through websites or web
blogs, which allows their sentiments to be stored in the form of textual data. The analysis of sentiment
expressions has become important because, based on these expressions, individuals or businesses can update
or assess their final verdicts or reviews [1]. Sentiment analysis is mostly used to classify these textual
reviews through machine learning methodologies powered by natural language processing tools, which output
probability scores for the neutral, positive, and negative classes.

Typically, sentiment analysis employs classifiers based on machine learning and lexicon-related
methodologies. The lexicon-oriented methodology is a dictionary- and corpus-based approach which calculates
the polarity or orientation of words through a bag-of-words representation. Both supervised and unsupervised
machine learning methods are considered highly reliable in categorizing and predicting sentiments as
neutral, negative, or positive. The supervised approach takes a labeled dataset, with the sentiment class
assigned to each item, as input during training, whereas unsupervised methods use datasets without any
labels [2], so the sentiments are not pre-classified.
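
To make the lexicon-oriented approach concrete, the following minimal sketch scores a text by summing word
polarities over its bag-of-words representation. The toy `LEXICON` dictionary, its scores, and the function
name `lexicon_polarity` are illustrative assumptions and do not come from any real lexicon.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> polarity score (values assumed for illustration).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def lexicon_polarity(text):
    """Label a text by summing lexicon polarities over its bag of words."""
    bag = Counter(text.lower().split())
    score = sum(LEXICON.get(word, 0.0) * count for word, count in bag.items())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Words that are missing from the lexicon contribute nothing, which is why such methods depend heavily on the
coverage of the underlying dictionary.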

Supervised and unsupervised machine learning essentially describe how we let the machine analyze the
provided dataset during the training process. In the supervised learning approach, the system is provided
with the expected outcome, and all the system needs to do during the learning phase is to figure out the
steps that lead to that outcome. In unsupervised learning, the system is not made aware of the outcome of
the individual data items; because of this, the input data is mostly not well defined, and unsupervised
learning remains challenging.

The classification process of opinion mining may be carried out at three different levels: the sentence
level, the document level, or the object level. This research effort is focused on document-based
classification of the subjectivity of the text extracted from a single text document. Here, the
unstructured reviews of a movie are first converted into an organized arrangement so that the features can
be extracted. Then a corresponding rank score, based upon the extracted features, is computed for the
labeled word arrangements. The rank score is then fed to the support vector machine (SVM) classifier to
predict the sentiment conveyed through the text as neutral, negative, or positive.


2. RELATED WORK 

Multiple feature extraction methods, such as unigrams, bigrams, a combination of both, unigrams with
part-of-speech (POS) labels, and unigrams with position information, are taken into account. Supervised
machine learning classification techniques such as Bayes, logistic regression, and SVM algorithms are
applied to these preprocessed data. Human prediction was found to be no better than machine learning
classification algorithms [3]. Fuzzy-based classification, preceded by tokenization, term frequency-inverse
document frequency (TF-IDF) [4], stop word removal, and POS tagging at the preprocessing stage, improved
the performance of a system trained on a movie reviews dataset.

The study reports that machine learning algorithms provided good classification results, with an accuracy
above 85%, when supervised training was employed on emotion-based datasets [5]. Film and Twitter reviews
are classified using WordNet with POS information by deriving words with a similar meaning in the same
context and then assigning the corresponding polarity from the SentiWordNet dictionary. The results show
an accuracy increase of 7% using machine learning classifiers such as AdaBoost, random forest, and
decision tree with WordNet synsets [6]. Another paper points out that supervised learning methods are more
accurate and efficient than semantic orientation based techniques but, at the same time, their computation
and time complexity is reportedly high [7]. A tweet text sentiment mining model consisting of
preprocessing, feature selection, and identification modules is presented in [8].

An enhanced feature selection method is developed in which, during the preprocessing step, unwanted word
removal, stemming, and POS tagging are performed. Feature selection methods such as mutual information
(MI), information gain (IG), chi-square (χ²), and TF-IDF are used to extract appropriate features [9].
Twitter sentiment analysis using a binary cluster-based framework to map the related components in a
tweet [10] helps the researcher to identify tweets; it includes ranking and scoring of the collected
features and is reported to be more accurate. The major advantage of this concept is that it analyzes the
occurrence of a specific keyword pair. This process creates a model to understand event handling in an
effective manner. It extracts and selects those feature variables which are required to train the model.
The features considered take the occurrence count and force the model to learn the combinations. This
process is not designed to work with image-based classification of events and tweets.

Ensemble modelling [11] helps in a better implementation of sentiment analysis. This process is based on a
fuzzy logic implementation where tokenization of the keywords is combined with part-of-speech (POS)
tagging of the keywords in the feature. Every word is considered as a feature and the POS defines the
analysis of the model. Ensemble modelling helps the model to learn from multiple occurrences. The
occurrences are handled with all the possible fuzzy logics and processed accordingly. In a fuzzy set, all
the possible mathematical operations can be performed with the probability of occurrence of the event. The
time-series methodology can be implemented using fuzzy logic. The dataset consists of the time frequencies
of the tweets on a specific topic, and the model analyzes the frequency and the word count (of a specific
keyword) related to the domain of the targeted tweet. If the tweet is unreadable or in a concise format,
the model may not identify the tweet and the prediction fails.

Statistical analysis [12] is the key concept in major implementations of sentiment analysis. Time
series-based modelling ensures that the frequency of occurrence of a tweet with specific key terms is
measured and plotted. Time-series, M-model, and autocorrelation methods create a path to predict some
insights of the data using fuzzy logic. The statistical analysis helps users to understand the analysis of
an event in combination with different standard events. Statistical analysis takes the event in the form
of a time series with all possible data from the different time stamps. As in the work of Phan et al. [11],
here also the tweets are evaluated using statistical modelling. The statistical modelling helps to analyze
the intake of tweets to the server and, in each server, the number of ways the analysis can be done is
predicted. The time-series models help to analyze the impact on intakes and predict the exact theory in
the tweet. The disadvantage of the approach of Ahmad et al. [12], as above, is that a tweet cannot be
identified when it is in an unreadable format or when it is posted from a virtual private network (VPN)
based location.

A systematic spatial and temporal sentiment analysis [13] builds the model based on the mental stability
of the user. An internal mental stability of the user is associated with every tweet, and tweets are
assumed to be posted according to this internal mental state. In that article, the researchers performed
sentiment analysis on Twitter data using the mental stability of the users.

Geo-tag [14] based implementations have helped researchers to locate the user who is tweeting on a
specific topic. The geo-tag reflects the motive of the tweet from a specific user. The location-based
analysis helps to analyze location-based tweets and the reason behind the tweets without controversy.
Tweet tag wars can be analyzed based on the location of the tweets, and this can help the stakeholders to
monitor issues during special events.


3. PROPOSED SYSTEM 

The proposed method is explained as follows. The system consists of five major phases: preprocessing,
feature extraction, feature normalization, feature selection, and classification. Here, we have used a
supervised learning approach. We have used two datasets for training the model and validating the
classifier; finally, the classifier classifies the input text based on the training data. The radial basis
function (RBF) kernel K(x, y) is used here, and optimization is also performed to increase the performance
of the system.

Step 1: Collection of online sentiment review dataset 

In this paper, we have used a polarity-based movie review dataset. A separate text record is kept for
every review. Moreover, Twitter and gold datasets are additionally prepared to show the results of the
proposed technique on various datasets. The Twitter application programming interface (API) is used to
obtain the Twitter dataset and the Amazon website is used to collect the gold dataset.

Step 2: Data-preprocessing of the obtained dataset 

Not all review contents are completely informative or directly express the significance of the opinion,
because they contain some contamination; preprocessing is therefore very important to remove those
impurities.

- Removal of unwanted punctuation: all punctuation marks that do not contribute to the meaning are removed.

- Removal of stop words: some words, known as stop words, are very common in any language. These stop
  words should be removed as a step of cleaning the textual data, since they do not have a significant
  impact upon the contextual or subjective meaning of the whole sentence. Examples of stop words include
  i, a, are, is, an, and so on.

- Stemming: a single word has many derivationally related forms, and stemming removes the affixes of such
  words so that they reduce to a common form.

- The Porter stemmer algorithm is employed for stemming. It limits the list of variant forms of the words
  and makes useful groupings of these words.

- During grammatical tagging, the part of speech (POS) of a word is used as a linguistic category which is
  characterised by its syntactic or morphological behaviour. Nouns, verbs, modifiers (adjectives and
  adverbs), pronouns, prepositions, conjunctions, and interjections are the common POS categories.

- POS labeling denotes each word with its appropriate POS. Here we have used the Stanford POS tagger for
  this tagging process.

- SentiWordNet is employed to provide sentiment scores for the tagged words, which are used as input to
  the SVM classifier to characterize the reviews. The neutral, positive, and negative word scores are
  defined within the SentiWordNet lexicon.
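
The cleaning steps listed above can be sketched in a few lines. This is only an illustration under stated
assumptions: the stop-word list is a tiny sample, and `naive_stem` is a crude suffix stripper standing in
for the Porter stemmer, not an implementation of it. POS tagging and SentiWordNet scoring are omitted.

```python
import string

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"i", "a", "an", "are", "is", "the", "was"}
# Suffixes removed by the naive stemmer below (a stand-in, NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(review):
    """Lowercase, strip punctuation, drop stop words, then stem each token."""
    cleaned = review.lower().translate(str.maketrans("", "", string.punctuation))
    return [naive_stem(tok) for tok in cleaned.split() if tok not in STOP_WORDS]
```

In practice a library stemmer and tagger would replace the hypothetical helpers used here.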

Step 3: Classification using enhanced SVM

After completion of the preprocessing phase, the preprocessed dataset is fed as input to the SVM and
Bayes classifiers for prediction of sentiments. We have tuned the hyperplane parameters for better
classification. SVM classification with the radial basis function kernel allows all the data to be spread
out, and the center is thereby chosen based on the nearest support vector en route to classifying the
input data. In SVM, the hyperplane is defined by the relation:


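$$\sum_{j} \alpha_j K(x_j, x) = Q \qquad (1)$$


where: $Q$ denotes the affine subspace,

$\alpha_j$ are the weights of the training points $x_j$, and $K(x_j, x)$ is the kernel function.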
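
As an illustration of the kernel function used in this step, the sketch below evaluates a Gaussian radial
basis function kernel for two feature vectors. The function name `rbf_kernel` and the default `gamma`
value are assumptions chosen for the example, not values reported in this paper.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The kernel equals 1 for identical vectors and decays towards 0 as the vectors move apart; a larger `gamma`
makes the decay faster.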
The soft-margin SVM classifier is obtained by minimizing the following expression:


$$\left[\frac{1}{n}\sum_{i=1}^{n}\max\left(0,\ 1 - y_i\left(w^{T}x_i - b\right)\right)\right] + \lambda\lVert w\rVert^{2} \qquad (2)$$
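
The soft-margin expression (2) can be evaluated directly for a candidate hyperplane. The sketch below
assumes the standard form of that expression, the mean hinge loss plus a regularization term weighted by
`lam`; the function and argument names are hypothetical.

```python
def soft_margin_objective(w, b, samples, labels, lam):
    """Mean hinge loss plus lam * ||w||^2 for labels in {-1, +1}."""
    hinge_total = 0.0
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
        hinge_total += max(0.0, 1.0 - margin)
    return hinge_total / len(samples) + lam * sum(wi * wi for wi in w)
```

Points that lie on the correct side with a margin of at least one contribute no loss, so only the
regularization term remains for a well separated training set.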


Step 4: Outcome 

The confusion matrix characterizes the performance of the system and allows us to identify the errors, if
any, incurred during the classification process. It provides the total number of correct and incorrect
predictions made on the test data, along with the counts in each class. The number of negative cases
predicted correctly, the number of positive cases predicted correctly, the number of actual negative cases
predicted as positive, and the number of actual positive cases predicted as negative, which are called
true negatives, true positives, false positives, and false negatives respectively, are provided in the
matrix and are used to calculate the overall accuracy of classification. It is regarded as one of the best
operational tools to analyze the system performance.
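
The four confusion matrix counts described above can be obtained with a short loop. This is a minimal
sketch for the binary case; the function name `confusion_counts` and the label strings are assumptions
made for the example.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Return (TP, TN, FP, FN) for a binary labelling."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn
```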

Figure 1 depicts the complete process flow in the system. The textual data captured through tweets and
reviews is basically unstructured; therefore, we need to apply natural language processing to the text as
a part of preprocessing, because it reduces the unwanted or noisy data, makes the text homogeneous, and
lets us provide accurate data to the next level of processing. The input is preprocessed, and the
preprocessed data is passed to the feature extraction block; the feature normalizer then performs
standardization, and the best features are selected by the feature selector. This output is then fed to
the classifier in order to obtain the sentiment decision. The Naive Bayes and SVM classifiers are used to
classify the input data. The same classification was performed with the optimized SVM and the scores were
compared.

Figure 1. Proposed process flow


The optimized SVM provides better feature extraction, which helps to extract the important information
from the given data so that the classifier can differentiate between the classes. Feature normalization
also helps in avoiding overfitting and redundancy. We use feature normalization to rescale the data as
double-precision values and feature selection to reduce the dimensionality and select the best features
for further training of the model.


3.1. Feature normalisation 


Min-max normalization is one of the most common ways to normalize data. For each feature, the minimum
value of that feature is transformed into zero, the maximum value is transformed into one, and every other
value is transformed into a decimal between zero and one. For example, if the minimum value of a feature
was twenty and the maximum value was forty, then thirty would be transformed to 0.5, since it is halfway
between twenty and forty.

Min-max normalization has one fairly significant downside: it does not handle outliers very well. For
instance, if there are ninety-nine values between zero and forty and one value is a hundred, then the
ninety-nine values are all transformed to values between 0 and 0.4, so that data remains just as squished
as before. Figure 2 shows the min-max normalization of the word frequencies, which are normalized between
0 and 1.

Figure 2. Min-max normalization
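
The min-max rule and its sensitivity to outliers can be checked with a small sketch. The function name
`min_max_normalize` is an assumption, and returning zeros for a constant feature is one possible
convention rather than something prescribed by this paper.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]
```

With the values 20, 30, and 40 the middle value maps to 0.5, whereas with 0, 40, and an outlier of 100 the
value 40 maps only to 0.4, which illustrates how a single outlier compresses the remaining values.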


3.2. Feature selection 

Feature selection is generally used to select the best features in order to help the classifier make
better decisions by reducing the number of training data values. Feature selection refines the features
and provides them to the learning phase. This helps to make the classifier more accurate and efficient:
some features do not provide much information, and in such cases the feature selector removes those
features and provides more valuable features to the model, which increases the system accuracy. We have
employed the wrapper method for feature selection.

In wrapper techniques, we use a subset of the features and train the model on those selected features.
Using bi-directional elimination, we start with a null model and keep adding one feature at a time using
forward selection. Before a new feature is added, the significance of the existing features is verified,
and any feature found to be insignificant is removed. The drawback of the method is that it essentially
reduces to a search problem and sometimes becomes computationally very expensive. Some common examples of
wrapper methods include forward selection, backward elimination, and recursive feature elimination.

- Forward selection: forward selection is an iterative method in which we begin with no feature in the
  model. In every iteration, we keep adding the feature that most increases the accuracy of the model,
  until the addition of a new variable no longer improves the performance of the system.

- Backward elimination: here we begin with all the features and eliminate the least significant feature at
  each iteration, as long as its removal improves the performance of the model.

- Recursive feature elimination: it is a greedy optimization algorithm that aims to find the best
  performing feature subset. It repeatedly builds a new model and sets aside the worst performing feature
  at every iteration. It constructs the next model with the remaining features until all the features are
  exhausted, and then ranks the features according to the order of their elimination.
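
Forward selection, the first of the wrapper strategies above, can be sketched as a greedy loop. The
`score` argument is a hypothetical callback that trains and evaluates a model on a feature subset and
returns its accuracy; it is an assumption of this example, not part of the described system.

```python
def forward_selection(features, score):
    """Greedy forward selection: add the feature that most improves score(subset)."""
    selected = []
    best = score([])
    remaining = list(features)
    while remaining:
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates)
        if top_score <= best:
            break  # no remaining feature improves the model, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected
```

Every iteration evaluates the model once per remaining feature, which is why wrapper methods can become
expensive when the feature set is large.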

The wrapper method is an easy and simple way to select features, as it provides the best or optimal
features with less computation time and finds the significance of each and every feature. The following
steps are performed for its implementation.

a. First, it adds randomness to the given data set by creating shuffled copies of all features (shadow
features).

b. Then, it trains a random forest classifier on the extended data set and applies a feature importance
measure to evaluate the importance of every feature, where a higher score marks a more important feature.

c. At each iteration, it checks whether a real feature has a higher importance than the best of its shadow
features (i.e. whether the feature has a higher Z-score than the maximum Z-score of its shadow features)
and continually removes features that are deemed highly unimportant.

d. At the end, the algorithm stops either when all features are confirmed or rejected, or when it reaches
a specified limit of random forest runs.
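
Steps a to d resemble what is commonly called the Boruta procedure. The sketch below illustrates only the
shadow-feature comparison and makes a loud simplification: the absolute Pearson correlation with the
target is used as the importance score in place of random forest importances and Z-scores. All names and
the majority-vote rule are assumptions of this example.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, a stand-in for random forest importance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else abs(cov) / (vx * vy) ** 0.5


def shadow_feature_filter(columns, target, runs=20, seed=0):
    """Keep features that beat the best shuffled (shadow) copy in most runs."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(runs):
        shadow_scores = []
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)  # step a: a shuffled copy carries no real signal
            shadow_scores.append(abs_correlation(shadow, target))
        threshold = max(shadow_scores)  # step c: importance of the best shadow feature
        for name, values in columns.items():
            if abs_correlation(values, target) > threshold:
                hits[name] += 1
    return [name for name, count in hits.items() if count > runs / 2]
```

A feature is kept only if it outperforms the best shadow feature in more than half of the runs, which
mirrors the idea of rejecting features that are no more informative than random noise.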


4. CLASSIFICATION 

SVM is one of the foremost classification tools and can be used for binary classification as well as
multi-class problems. Binary classification classifies the data into two classes, while multi-class
classification classifies the data into more than two classes. Here, we have used the supervised SVM
method and employed two datasets for training. After the model is trained, the classifier is tested for
its performance on test data. A radial basis function kernel is used, and optimization is also performed
to increase the performance of the system.

SVM is a very useful technique for grouping or classifying labelled data. Before the classification task
is initiated, the whole dataset is divided into training and test sets, each of which comprises a fixed
percentage of the data [15]. Each instance in the training set contains one target output and several
attributes. The objective of the SVM is to produce a model which predicts the target values of the data
instances in the test set, which comprises unlabeled features only.
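
The division into training and test sets can be sketched as a shuffled split. The 70% training fraction
follows the 70-30 partition reported in the results; the function name `train_test_split` and the fixed
seed are assumptions made for reproducibility of the example.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before the cut avoids a split that follows the original ordering of the reviews, for example
all positive reviews first.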

Classification by SVM is a supervised learning process. The output labels in the test data help to
demonstrate whether the system is behaving correctly or not. The aim of SVM classification is to find a
hyperplane, which is a line, a two-dimensional (2D) plane, or a higher-dimensional surface depending upon
the dimensionality of the feature space.

After training, the SVM [16], [17] classifier finds the hyperplane, that is, it sets the parameters a and
b. The SVM has another set of parameters called hyperparameters [18], [19]: the soft margin constant, C,
and any parameters the kernel may depend upon (such as the width of a Gaussian radial basis kernel). In
this paper, we show the effect of the hyperparameters on the decision boundary of an SVM using
two-dimensional examples. For a large value of C, a large penalty is assigned to errors and margin errors.
This is seen where the two data points nearest to the hyperplane affect its orientation, resulting in a
hyperplane which comes close to several other data points. The penalty parameter permits a certain degree
of misclassification, which is especially important for non-separable training sets. It makes it possible
to control the trade-off between permitting training errors and enforcing rigid margins. Increasing this
value also increases the cost of misclassifying points and produces a more accurate model that may not
generalize well. A threshold is used to indicate the degree of confidence that the nearest segments of a
given segment belong to the same class as that segment. Higher values mean more confidence, so only the
closest segments are classified.

For nonlinear data, the inputs are transformed to a higher-dimensional space, where the kernel product of
the test input with each and every support vector is computed, so the nonlinear mapping does not need to
be generated explicitly. The further process is similar to that of the linear case. The data points which
are closest to the hyperplane are used to maximize the distance between the classes so that future data
points are classified correctly. The SVM classifier utilized to categorize reviews uses a radial basis
function kernel and is tuned through its hyperparameters, the soft margin constant and gamma. The enhanced
SVM therefore gives better outcomes compared to the linear/nonlinear SVM, logistic regression, and naive
Bayes classifiers. The proposed system provides better results than the other state-of-the-art methods
considered.


4.1. Classifiers 
4.1.1. Logistic regression 

Logistic regression, in spite of containing the term 'regression' [20], is used for classification; it
employs a linear/non-linear regression curve to produce discrete outputs. It is based on maximum
likelihood estimation [21] and qualitative model selection. A threshold value is always used, which
specifies the class to which a data case is expected to be assigned. Logistic regression can also be used
to construct models for multi-class classification problems.
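
The role of the threshold can be illustrated with a minimal prediction sketch. It assumes a model with
already fitted `weights` and `bias` (hypothetical inputs; no training is shown) and a default threshold of
0.5.

```python
import math


def predict_logistic(weights, bias, features, threshold=0.5):
    """Return (probability, class label) from a linear score passed through a sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability, int(probability >= threshold)
```

Raising the threshold makes the classifier more conservative about predicting the positive class, which
trades sensitivity for specificity.
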
4.1.2. Naive Bayes

Bayes' theorem for conditional probability is used for classification by assigning class labels to test
inputs, which are feature sets. The naïve Bayes classifier acts on the principle that the value of a
feature is independent of the values of all other features for a given class. The naïve Bayes classifier
employs the principle of maximum likelihood [22] for parameter estimation. The classifier expects the data
set in the form of a frequency table, which is used to generate a likelihood table after the probability
of each feature is calculated. Then Bayes' theorem is applied to calculate the posterior probability.
Because our review dataset follows a multinomial distribution [23], we have implemented the multinomial
naïve Bayes classifier.
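
The frequency table, likelihood table, and posterior computation described above can be sketched for
tokenized documents as follows. Laplace smoothing with `alpha=1.0` and the function names are assumptions
of this example; the paper does not state which smoothing was used.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from token lists."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)  # frequency table: class -> word -> count
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = (
            math.log(n_docs / len(labels)),
            {w: math.log((word_counts[label][w] + alpha) / total) for w in vocab},
        )
    return model


def classify_nb(model, tokens):
    """Pick the class with the highest posterior log probability."""
    def log_posterior(label):
        log_prior, log_likelihood = model[label]
        # Words outside the training vocabulary are ignored.
        return log_prior + sum(log_likelihood[w] for w in tokens if w in log_likelihood)
    return max(model, key=log_posterior)
```

Working with log probabilities avoids numerical underflow when many word likelihoods are multiplied.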


4.1.3. Dataset 

We have used the SentiWordNet dataset here. A separate document is kept for each review. A Twitter gold
dataset is additionally used to show the results of the proposed methodology on a completely different
dataset. The Twitter API [24] is used for extracting the Twitter dataset and the Amazon website is used to
collect the gold dataset. The following pseudocode illustrates the steps involved in this implementation.
In the following algorithm, the dataset is taken from the Twitter API and converted into a dataset.csv
file. Then API() and Img_set() are executed on the dataset to obtain the accuracy and performance measures
of the algorithm. Using the scikit-learn (SKLearn) library of Python, the accuracy matrix was plotted.


Twitter analysis (d1, d2):
    def ta1():
        # grab the datasets from the Twitter API
        API(self, SecretKey, AuthenticationKey)
        if SecretKey == (Username, Password):
            pd.write("Dataset.csv", "w")
        else:
            return 0

    def ta2():
        # grab the image datasets related to the Twitter analysis
        Img_set(Username, Password):
            Authenticate(Username, Password)
            return 0

    Ob1.API()
    Ob2.Img_set()

    # import SVM from SKLearn
    plot Ob1.svm
    plot Ob2.svm
    # import accuracy matrix from SKLearn
    plot accuracy matrix
    # repeat the SVM training while the prediction error stays large
    if diff(y^, y) >= 0.5:
        repeat SVM
    else:
        return 0





5. RESULTS AND DISCUSSION 

Accuracy, sensitivity, and specificity [25] are the three parameters we consider for performance
analysis.
Accuracy: The accuracy can be calculated as:

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$

Sensitivity: The sensitivity can be calculated as:

$$SEN = \frac{TP}{TP + FN} \qquad (4)$$

Specificity: The specificity can be calculated as:

$$SPEC = \frac{TN}{TN + FP} \qquad (5)$$

F1 Score: The F1-score can be calculated as:

$$F_1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \qquad (6)$$

Precision: The precision can be calculated as:

$$Precision = \frac{TP}{TP + FP} \qquad (7)$$

Recall: The recall can be calculated as:

$$Recall = \frac{TP}{TP + FN} \qquad (8)$$

Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false
negatives taken from the confusion matrix.
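
The measures in equations (3) to (8) can be computed together from the four confusion matrix counts. In
this sketch sensitivity is taken in its standard form, TP / (TP + FN), which makes it identical to recall;
the function name is an assumption, and non-zero denominators are assumed.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, precision, recall, and F1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }
```

For instance, with 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives, the
accuracy is 0.85, the sensitivity is 0.8, and the specificity is 0.9.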


Table 1 shows the performance of the proposed method, which is better than that of the other methods.
Figure 3 shows a comparative analysis of the three classifiers employed for sentiment subjectivity
analysis. Accuracy shows the training accuracy of the three models, and the F1-score [26], being a better
estimator than precision or recall alone, depicts the validation accuracy. The validation accuracy being
close to, though lower than, the training accuracy suggests that our model has not overfitted. Also, a
validation accuracy of 94% by the SVM classifier is substantially ahead of the 87% and 90% shown by naïve
Bayes and logistic regression respectively. The high sensitivity and specificity attained by the SVM also
suggest that the model is able to correctly classify true positives and true negatives.


Table 1. Performance with proposed method (all values in %)

Method                 Accuracy  Sensitivity  Specificity  F1-Score  Precision  Recall
Bayes                  87        85           89           87        82         83
Logistic Regression    91        89           93           90        90.65      92.45
Proposed Methodology   97.5      96           97           94        96         93.01


Figure 3. The performance visualization


Figure 4 shows the receiver operating characteristic (ROC) curve [27], plotted as true positive rate
versus false positive rate; it indicates the trade-off between sensitivity (TPR) and specificity
(1 - FPR). All the classifiers tend to be close to the upper-left corner, where TPR equals 1, and the SVM,
being the closest to that corner, shows its improvement over the others. The ROC curve of the proposed
system yields a better area under the curve (AUC) than the other conventional methods. Table 2 shows the
accuracy, sensitivity, and specificity of the proposed method, which are better than those of the others
with a 70-30 training and testing partition.
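
How an ROC curve and its AUC follow from classifier scores can be sketched as below. The example assumes
binary labels in {0, 1}, at least one example of each class, and distinct scores (ties are not treated
specially); both function names are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep a threshold over the scores and return the (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A classifier that ranks every positive example above every negative one reaches the upper-left corner and
obtains an AUC of 1, while interleaved rankings pull the curve towards the diagonal.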

Figure 5 shows the performance of the classifiers after cross validation is performed with a split of
70% training data and 30% test data. Figure 6 compares the true positive, true negative, false positive,
and false negative rates from the confusion matrix of the proposed classifier with those of the other
classifiers. The corresponding table shows that the true positive, true negative, false positive, and
false negative performance of the proposed method is better than that of the others. Table 3 shows that
the accuracy, sensitivity, and specificity of the proposed method are higher than those of the others.
Figure 7 shows the graphical representation of the classification summary parameters of the classifiers
when modelled against different real-world review data from a different source.

Figure 4. ROC performance


Table 2. Performance with proposed method with 70-30 partition (all values in %)

Method                 Accuracy  Sensitivity  Specificity  F1-Score  Precision  Recall
Bayes                  81        83           82           81        79.5       80.2
Logistic Regression    89        87           91           81.36     82.12      80.98
Proposed Methodology   96.5      94           96           95        94.5       96.9

Figure 5. Performance visualization with 70-30 partitions

Table 3. Performance with proposed method with real-world data (all values in %)

Method                 Accuracy  Sensitivity  Specificity  F1-Score  Precision  Recall
Bayes                  86        84           87           85        84         82
Logistic Regression    91        88           92           94        91         91
Proposed Methodology   96.5      95           96.5         95        94.5       96.9


Figure 7. Performance visualization with real-world data


6. CONCLUSION 

Sentiment mining is an important form of analysis for categorizing user opinions for future predictions
and valuable outcomes. Here we have designed a novel sentiment mining system with improved performance. We
have also developed an enhanced SVM algorithm for better classification by tuning the hyperparameter
values and exploiting the feature selection and feature normalization processes. Both feature
normalization and feature selection prove to be very helpful for better classification. We have used
feature normalization to rescale the data as double-precision values and feature selection to select the
optimal features for further processing. The proposed system provides better results than the other
state-of-the-art methods considered. This method can therefore be used to analyse sentiment data in
real-time environments, since it provides better accuracy than other conventional methods.